[
  {
    "path": ".gitignore",
    "content": "*~\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2021 Amir Gholami\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# Memory Footprint and FLOPs for SOTA Models in CV/NLP/Speech\n\nThis is a repository with the data used for the [AI and Memory Wall paper](https://arxiv.org/pdf/2403.14123.pdf). We report the number of paramters, feature size, as well as the total FLOPs for inference/training for SOTA models in CV, Speech Learning, and NLP. \n\n\n## NLP Models\nWe mostly focus on calculating the different metrics for transformer models, starting from the original BERT FLOPs for training/inference, as well as its parameters and memory footprint. We then calculate the same metrics for different BERT variations as reported in the table below.\n\nP.S: The total PFLOPs required to train each model, is calculated by using the setup reported in each paper.\n\n\n|    Date    |      Model      | Token Size |   #Params   | #Features | Inference GFLOPs | Training PFLOPs |\n|------------|-----------------|------------|-------------|-----------|------------------|-----------------|\n| 09/10/2014 | Seq2Seq         |            |             |           |                  | 11,000          |\n| 12/06/2017 | Transformer     | 512        | 65M         | 77M       | 54               | 23,000          |\n| 02/15/2018 | ELMo            |            | 94M         |           |                  | 3,300           |\n| 10/11/2018 | BERT Large      | 512        | 330M        | 230M      | 340              | 250,000         |\n| 06/11/2018 | GPT-1           | 512        | 110M        | 85M       | 96               | 57,000          |\n| 02/14/2019 | GPT-2           | 1024       | 1,500M      | 2,000M    | 3,400            |                 |\n| 07/26/2019 | RoBERTa Large   | 512        | 1,500M      | 2,000M    | 3,400            | 4,300,000       |\n| 08/17/2019 | Megatron        | 1024       | 8,300M      | 4,700M    | 18,000           | 8,100,000       |\n| 09/26/2019 | ALBERT xxl      | 512        | 235M        | 450M      | 2,500            | 31,000,000      |\n| 02/13/2020 | Microsoft T-NLG | 1024       | 17,000M     | 5,700M    | 36,000           | 28,000,000      |\n| 03/23/2020 | ELECTRA Large   | 128        | 330M        | 38M       | 79               | 3,100,000       |\n| 05/28/2020 | GPT-3           | 2048       | 175,000M    | 63,000M   | 740,000          | 310,000,000     |\n| 06/30/2020 | GShard          |            | 600,000M    |           |                  |                 |\n| 06/20/2020 | Baidu RecSys-C  | N/A        | 2,000,000M  | N/A       | ~O(0.1)          | N/A             |\n| 06/20/2020 | Baidu RecSys-E  | N/A        | 10,000,000M | N/A       | ~O(0.1)          | N/A             |\n\n\n## CV Models\nThe table below reports the different metrics for various SOTA vision models, including the input image resolution, the number of parameters, the total inference GFLOPs, as well as the total PFLOPs required to train each model.\n\n|    Date    |       Model       | Input Resolution | #Params | Inference GFLOPs | Training PFLOPs |\n|------------|-------------------|------------------|---------|------------------|-----------------|\n| 06/01/2012 | AlexNet           | 227 x 227        | 61M     |              1.4 | 460             |\n| 09/04/2014 | VGG-19            | 224 x 224        | 138M    |               39 | 11,000          |\n| 12/02/2015 | InceptionV3       | 299 x 299        | 24M     |              5.7 | 100,000         |\n| 12/10/2015 | ResNet152         | 224 x 224        | 55M     |               23 | 11,000          |\n| 02/26/2016 | InceptionV4       | 299 x 299        | 82M  
\n## CV Models\nThe table below reports the same metrics for various SOTA vision models: the input image resolution, the number of parameters, the total inference GFLOPs, and the total PFLOPs required to train each model.\n\n|    Date    |       Model       | Input Resolution | #Params | Inference GFLOPs | Training PFLOPs |\n|------------|-------------------|------------------|---------|------------------|-----------------|\n| 06/01/2012 | AlexNet           | 227 x 227        | 61M     |              1.4 | 460             |\n| 09/04/2014 | VGG-19            | 224 x 224        | 138M    |               39 | 11,000          |\n| 12/02/2015 | InceptionV3       | 299 x 299        | 24M     |              5.7 | 100,000         |\n| 12/10/2015 | ResNet152         | 224 x 224        | 55M     |               23 | 11,000          |\n| 02/26/2016 | InceptionV4       | 299 x 299        | 82M     |             24.6 |                 |\n| 10/07/2016 | Xception          | 299 x 299        | 23M     |               17 | 450,000         |\n| 11/16/2016 | ResNeXt101(64x4d) | 224 x 224        | 83M     |               31 | 12,000          |\n| 12/03/2016 | DenseNet201       | 224 x 224        | 20M     |              8.9 | 2,800           |\n\n## Memory Breakdown\nThe table below reports the breakdown of the memory required to train different SOTA models throughout the years: the memory needed to store the parameters, the memory footprint of the optimizer state, and the activation/feature memory. A worked example follows the table.\n\n| Year |         Model         | Input Resolution (Sequence Length) | Batch Size | Params Memory | Optimizer Memory | Activation Memory | Total Memory |\n|------|-----------------------|------------------------------------|------------|---------------|------------------|-------------------|--------------|\n| 2012 | AlexNet               | 227 x 227                          |        128 | 0.23 GB       | 0.23 GB          | 0.71 GB           | 1.17 GB      |\n| 2014 | VGG19                 | 224 x 224                          |         64 | 0.54 GB       | 0.54 GB          | 4.64 GB           | 5.72 GB      |\n| 2015 | ResNet152             | 224 x 224                          |         32 | 0.22 GB       | 0.22 GB          | 5.14 GB           | 5.58 GB      |\n| 2016 | DenseNet201           | 224 x 224                          |         32 | 0.07 GB       | 0.07 GB          | 6.04 GB           | 6.18 GB      |\n| 2016 | ResNeXt101 (64x4d)    | 224 x 224                          |         32 | 0.31 GB       | 0.31 GB          | 7.34 GB           | 7.96 GB      |\n| 2017 | Transformer Big (WMT) | 512                                |          6 | 1.02 GB       | 2.04 GB          | 11.78 GB          | 14.84 GB     |\n| 2018 | BERT Large            | 512                                |         16 | 1.32 GB       | 2.64 GB          | 14.38 GB          | 18.34 GB     |\n| 2019 | GPT-2                 | 1024                               |          1 | 5.86 GB       | 11.62 GB         | 8.63 GB           | 26.11 GB     |\n
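\nThe parameter and optimizer columns follow from the parameter counts above, assuming fp32 weights (4 bytes per value), one extra optimizer state per parameter for the momentum-SGD-trained CNNs, and two extra states per parameter for the Adam-trained transformer models; these byte-count assumptions are ours, chosen to reproduce the table. Activation memory additionally depends on batch size and architecture. A minimal sketch:\n\n```python\n# Parameter and optimizer-state memory, fp32 (4 bytes per value), decimal GB.\n\ndef param_memory_gb(n_params: float, bytes_per_value: int = 4) -> float:\n    return n_params * bytes_per_value / 1e9\n\ndef optimizer_memory_gb(n_params: float, n_states: int) -> float:\n    # n_states: 1 for SGD with momentum (the CNNs), 2 for Adam (the transformers).\n    return n_states * param_memory_gb(n_params)\n\n# Example: BERT Large, 330M parameters, trained with Adam.\nprint(param_memory_gb(330e6))         # 1.32 GB, matching the table\nprint(optimizer_memory_gb(330e6, 2))  # 2.64 GB, matching the table\n```\n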
\n## Acknowledgments\nPlease cite the following if you find this data useful for your work:\n\n```text\nGholami A, Yao Z, Kim S, Mahoney MW, Keutzer K. AI and Memory Wall. RiseLab Medium Blog Post, University of California Berkeley, 2021, March 29.\n```\n\n```text\n@article{gholami2020ai_and_memory_wall,\n  title={AI and Memory Wall},\n  author={Gholami, Amir and Yao, Zhewei and Kim, Sehoon and Hooper, Coleman and Mahoney, Michael W. and Keutzer, Kurt},\n  journal={IEEE Micro},\n  year={2024}\n}\n```\n"
  }
]